An automatic segmentation system distinguishes foreground 
and background objects by first encoding and decoding a 
first image at a first time reference. Macroblocks are 
extracted from a second image at a second time reference. 
The macroblocks are mapped to pixel arrays in the decoded 
first image. Frame residuals are derived that represent the 
difference between the macroblocks and the corresponding 
pixel arrays in the previously decoded image. A global 
vector representing camera motion between the first and 
second images is applied to the macroblocks in the second 
image. The global vectors map the macroblocks to a second 
pixel array in the first decoded image. Global residuals 
between the macroblocks and the second mapped image 
arrays in the first image are derived. The global 
residuals are compared with the frame residuals to determine 
which macroblocks are classified as background and 
foreground. The macroblocks classified as background are then 
blended into a mosaic. 
MOSAIC GENERATION AND SPRITE-BASED
CODING WITH AUTOMATIC FOREGROUND
AND BACKGROUND SEPARATION

This application claims benefit of provisional application
No. 60/041,777, filed Mar. 31, 1997.

BACKGROUND OF THE INVENTION

This invention relates to mosaic generation and sprite-based
coding, and more particularly, to sprite-based coding
with automatic foreground and background segmentation.
Throughout the document, the terms "sprite" and "mosaic"
will be used interchangeably.

Dynamic sprite-based coding can use object shape information
to distinguish objects moving with respect to the
dominant motion in the image from the rest of the objects in
the image. Object segmentation may or may not be available
before the video is encoded. Sprite-based coding
with a priori object segmentation increases coding efficiency
at sufficiently high bit rates where segmentation
information, via shape coding, can be transmitted.

When object segmentation is available and transmitted,
sprite reconstruction uses the dominant motion of an object
(typically, a background object) in every video frame to
initialize and update the content of the sprite in the encoder
and decoder. Coding efficiency improvements come from
scene re-visitation, uncovering of background, and global
motion estimation. Coding gains also come from smaller
transmitted residuals as global motion parameters offer
better prediction than local motion vectors in background
areas. Less data is transmitted when a scene is revisited or
background is uncovered because the uncovered object
texture has already been observed and has already been
incorporated into the mosaic sometime in the past. The
encoder selects the mosaic content to predict uncovered
background regions or other re-visited areas. Coding gains
come from the bits saved in not having to transmit local
motion vectors for sprite predicted macroblocks.

However, the segmentation information may not be available
beforehand. Even when available, it may not be possible
to transmit segmentation information when the communication
channel operates at low bit rates. Shape
information is frequently not available since only a small
amount of video material is produced with blue screen
overlay modes. In these situations, it is not possible to
distinguish among the various objects in each video frame.
Reconstruction of a sprite from a sequence of frames made
of several video objects becomes less meaningful when each
object in the sequence exhibits distinct motion dynamics.
However, it is desirable to use dynamic sprite-based coding
to take advantage of the coding efficiency at high bit rates
and, if possible, extend its performance at low bit rates as
well. Shape information takes a relatively larger portion of
the bandwidth at low bit rates. Thus, automatic segmentation
provides a relatively larger improvement in coding efficiency
at low bit rates.

Current sprite-based coding in MPEG-4 assumes that
object segmentation is provided. With the help of segmentation
maps, foreground objects are excluded from the
process of building a background panoramic image.
However, the disadvantage of this approach is that object
segmentation must be performed beforehand. Object segmentation
is a complex task and typically requires both
spatial and temporal processing of the video to get reliable
results.

Temporal linear or non-linear filtering is described in U.S.
Pat. No. 5,109,435, issued Apr. 28, 1992, entitled Segmentation
Method for Use Against Moving Objects, to Lo, et al.
Temporal filtering is used for segmenting foreground objects
from background objects for the purpose of reconstructing
image mosaics. This approach has two disadvantages: First,
it requires that several frames be pre-acquired and stored so
temporal filtering can be performed. Second, it does not
explicitly produce a segmentation map, which can be used
to refine motion estimates.

Analysis of motion residuals is described in U.S. Pat. No.
5,649,032, issued Jul. 15, 1997, entitled System for Automatically
Aligning Images to Form a Mosaic Image, to Burt,
et al. This method separates foreground objects from background
objects in a mosaic but does not reconstruct a mosaic
representative of the background object only (see description
in the real time transmission section). Post-processing
must be used to eliminate the foreground objects.

Accordingly, a need remains for automatically performing
on-line segmentation and sprite building of a background
image (object undergoing dominant motion) when prior
segmentation information is neither available nor used due
to bandwidth limitations.

SUMMARY OF THE INVENTION

Automatic object segmentation generates high quality
mosaic (panoramic) images and operates with the assumption
that each of the objects present in the video scene
exhibits dynamical modes which are distinct from the global
motion induced by the camera. Image segmentation, generation
of a background mosaic and coding are all intricately
linked. Image segmentation is progressively achieved in
time and based on the quality of prediction signal produced
by the background mosaic. Consequently, object segmentation
is embedded in the coder/decoder (codec) as opposed to
being a separate pre- or post-processing module, reducing the
overall complexity and memory requirements of the system.

In the encoder, foreground and background objects are
segmented by first encoding and decoding a first image at a
first time reference. The method used to encode and decode
this first image does not need to be specified for the purpose
of this invention. The second image at a second time
reference is divided into non-overlapping macroblocks
(tiles). The macroblocks are matched to image sample arrays
in the decoded first image or in the mosaic. In the first case,
the encoder uses local motion vectors to align an individual
macroblock with one or several corresponding image sample
arrays in the previous decoded image. In the second case, the
encoder uses parameters of a global motion model to align
an individual macroblock with a corresponding mosaic
sample array. The encoder evaluates the various residuals
and selects the proper prediction signal to use according to
a pre-specified policy. This decision is captured in the
macroblock type. The macroblock types, the global motion
parameters, the local motion vectors and the residual signals
are transmitted to the decoder.

Frame residuals represent the difference between the
macroblocks and corresponding image arrays in the previously
decoded image matched by using local motion vectors.
Macroblocks having a single local motion vector are
identified as INTER1V-type macroblocks. Macroblocks
having multiple (4) local motion vectors are identified as
INTER4V-type macroblocks. INTER4V macroblocks are
always labeled as foreground. INTER1V macroblocks can
be labeled either foreground or background.

A global motion model representing camera motion
between the first and second image is applied to the macroblocks
in the second image. The global vector maps the
macroblocks to a corresponding second image sample array
in the first decoded image. Global residuals between the
macroblocks and the second image array are derived. When
the global residuals are greater than the INTER1V frame
residuals, the macroblocks are classified as foreground.
When the INTER1V frame residuals are greater than the
global residuals, the macroblocks are classified as background.
By comparing the global residuals to the INTER1V
frame residuals derived from the previously decoded image,
the mosaic can be automatically updated with the image
content of macroblocks likely to be background.

Mosaic residuals represent the difference between the
macroblocks and corresponding global motion compensated
mosaic arrays. Any macroblocks tagged as mosaic prediction
type are classified as background.

A segmentation map can be used to classify the macroblocks
as either foreground or background. A smoothing
process is applied to the segmentation map to make foreground
and background regions more homogeneous. The
mosaic is then updated with the contents of macroblocks
identified as background in the smoothed segmentation map.

Automatic segmentation does not require any additional
frame storage and works in both coding and non-coding
environments. In a non-coding environment, the invention
operates as an automatic segmentation-based mosaic image
reconstruction encoder. Automatic object segmentation
builds a mosaic for an object exhibiting the most dominant
motion in the video sequence by isolating the object from the
others in the video sequence and reconstructing a sprite for
that object only. The sprite becomes more useable since it is
related to only one object. The results of the auto-segmentation
can be used to obtain more accurate estimates
of the dominant motion and prevent the motion of other
objects in the video sequence from interfering with the
dominant motion estimation process.

Automatic object segmentation can be integrated into any
block-based codec, in particular, into MPEG-4, and is based
on macroblock types and motion compensated residuals.
Dominant motion compensation is used with respect to the
most recently decoded VO plane. A spatial coherency constraint
is enforced to maintain the uniformity of segmentation.
Automatic segmentation is used in a non-coding
environment, for example in the context of building a
background image mosaic only (or region undergoing dominant
motion) in the presence of foreground objects. Thus,
automatic sprite-based segmentation is not only useful for
on-line dynamic sprites but can also be used in generating an
off-line (e.g., background) sprite that can be subsequently
used in static sprite coding.

The foregoing and other objects, features and advantages
of the invention will become more readily apparent from the
following detailed description of a preferred embodiment of
the invention, which proceeds with reference to the accompanying
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an image frame divided into
multiple macroblocks.

FIG. 2 is a diagram showing an INTER1V prediction
mode.

FIG. 3 is a diagram showing an INTER4V prediction
mode.

FIG. 4 is a diagram showing a MOSAIC prediction mode.

FIG. 5 is a block diagram of an automatic segmentation
encoder and decoder according to the invention.

FIG. 6 is a step diagram showing how the automatic
segmentation is performed according to the invention.

FIG. 7 is a step diagram showing how macroblocks in the
image frame shown in FIG. 1 are classified as foreground
and background according to the invention.

FIG. 8 is a schematic representation showing how the
macroblocks are classified as foreground and background.

FIG. 9 is a segmentation map and smoothed segmentation
map according to another aspect of the invention.

FIG. 10 is a step diagram showing how pixels in background
macroblocks are blended into a mosaic.

DETAILED DESCRIPTION

Referring to FIG. 1, automatic segmentation extracts a
background object 13, such as a hillside or a tree, from a
sequence of rectangular-shaped video object planes (VOPs)
18. The VOPs 18 are alternatively referred to as frames or
image frames. It is assumed that a previous decoded VOP 16
is available at time t-1. A current VOP 14 is available at time
t. Terms used to describe automatic segmentation according
to the invention are defined as follows.

(j,k): Position of a macroblock 15 in the Video Object
Plane (VOP) 14 currently being encoded. The coordinates
(j,k) represent the upper left corner of the macroblock 15.
The size of a macroblock is B_h x B_v pixels, where B_h is the
horizontal dimension and B_v is the vertical dimension of the
macroblock, respectively.

MBType(j,k): Macroblock type. This quantity takes the
value INTRA, INTER1V (one motion vector for the whole
macroblock), INTER4V (four motion vectors, one for each of the
8x8 blocks in the macroblock), MOSAIC, SKIP and
TRANSPARENT. The INTRA macroblock type corresponds
to no prediction from the previous VOP 16 because
there are no good matches between the macroblock 15 and
any encoded/decoded 16x16 pixel image in VOP 16. INTRA
macroblocks typically occur when new image areas appear
in VOP 14 that cannot be predicted. Instead of encoding the
differences between macroblock 15 and the best matched
16x16 pixel image in VOP 16, the macroblock 15 is encoded
by itself (equivalent to using a prediction signal equal to 0).

Referring to FIG. 2, the INTER1V macroblock type
corresponds to a prediction from the previous decoded VOP
16 at time t-1. In this case, a prediction signal is computed
using one motion vector 17 to align the current macroblock
15 (j,k) with a 16x16 pixel array 18 in a previously encoded
VOP 16. The motion vector is the pixel distance that
macroblock 15 is shifted from the (j,k) position in VOP 14
to match up with a similar 16x16 pixel image in VOP 16.
The prediction signal is obtained by applying a local motion
vector to the current macroblock 15 that maps to the 16x16
pixel image in the previous VOP 16. To reduce the amount
of data transmitted, only the macroblock motion vector and
residual are transmitted instead of all pixel information in
macroblock 15. Motion vectors move on either a pixel or
subpixel resolution with respect to the previous VOP 16.

FIG. 3 shows the INTER4V macroblock type that corresponds
to a prediction computed using four motion vectors
19. Each motion vector 19 aligns one sub-macroblock 21
with an 8x8 pixel array 20 in the previous VOP 16. FIG. 4
shows the MOSAIC macroblock type corresponding to a
prediction made from the mosaic 22 updated last at time t-1.
A global motion model aligns the current macroblock 15
with a 16x16 pixel array 24 in mosaic 22. The TRANSPARENT
macroblock mode relates to object based encoding
modes where a portion of an image is blocked out for
insertion of subsequent object data. The SKIP macroblock
mode is equivalent to MOSAIC macroblock mode for which
the mosaic residual signal is equal to 0.

The residuals generated from the global and various local
motion models are compared. The macroblock is usually
tagged as the macroblock type with the smallest residuals.
However, the macroblock type selection could follow a
different policy without affecting the invention described
herein.

The various residuals used by this invention are defined
as follows:

RES(j,k): The transmitted residual at the macroblock
(j,k). This residual results from computing the difference
between the predictor (reference) image in either the
mosaic (MBType(j,k)=MOSAIC or SKIP) or the previous
frame from VOP 16 (MBType(j,k)=INTER1V or
INTER4V) and the data in the macroblock 15, depending on
which macroblock type has been selected. The value of
RES(j,k) is 0 if the macroblock is of type INTRA.

GMER(j,k): Global motion estimation residual. The
residual at the macroblock (j,k) resulting from backward
warping the current macroblock and comparing it with the
previously decoded VOP 16. The warping is done using the
transmitted and decoded global motion parameters (i.e. from
a stationary model, a translational model, an affine model or a
perspective model). The global motion estimation residual is
the difference between the macroblock 15 and the global
motion compensated pixel array in the previous VOP 16. In
other words, GMER(j,k) is the difference between the
macroblock 15 and a corresponding pixel array in the
previous VOP 16 after removing the effects of camera
rotation, zoom, perspective angle changes, etc. Global
motion parameters are encoded and transmitted with each
VOP 18. The calculation of GMER(j,k) is described in
further detail in FIG. 8.

QP: The current value of the quantizer step used by the
encoder to compress the texture residuals in the macroblock
(j,k).

θ(): A pre-defined threshold value greater than or equal to
1. This threshold value is a function of the quantizer step QP.

W_f(): Forward warping operator.

W_b(): Backward warping operator.

w: Vector of warping parameters specifying the
mappings W_f() and W_b(). The vector w has zero, two, six or
eight entries depending on whether the warping is an identity,
a translational, an affine or a perspective transformation,
respectively.

α: A pre-defined blending factor.

Warping operators compensate an image for changes in camera
perspective, such as rotation, zoom, etc. Implementation of
warping operators is well known in the art and, therefore, is
not described in further detail.

FIG. 5A shows functional blocks in an automatic segmentation
encoder 25 and FIG. 5B shows functional blocks
in an automatic segmentation decoder 35 according to the
invention. A camera 26 generates VOPs 18 (see FIG. 1) and
a macroblock separator 28 tiles the current VOP 14 into
multiple macroblocks 15. A frame predictor 29 matches each
individual macroblock 15 with pixel arrays in the previously
encoded/decoded VOP frame 16 and generates frame (local)
motion vectors and frame residuals associated with the
macroblocks in the current VOP 14. Frame predictor 29 is
used for assessing INTER1V and INTER4V prediction.

A mosaic predictor 33 matches the macroblocks 15 with
pixel arrays in the mosaic 22 by using Global Motion
Parameters calculated by Global Motion Estimation and
Encoding Unit 27. Such parameters are estimated using
original VOPs at time t and t-1 (41). The mosaic predictor
33 produces mosaic residuals associated with each macroblock
15. A global residual computation unit 31 matches the
macroblocks with pixel arrays in the previously decoded
VOP frame 16 according to frame global motion parameters
and generates the global motion estimation residuals
GMER(j,k). The global motion parameters are decoded by the
decoder 47. An encoder 32 tags each macroblock as either
TRANSPARENT or MOSAIC or SKIP or INTRA or
INTER1V or INTER4V upon comparing the mosaic
residual signal with the local residual signals.
Encoder 32 also inserts the global motion parameters in the
encoded bit stream.

The INTER1V or INTER4V prediction types are alternatively
referred to as FRAME prediction types. The
foreground/background segmentation and mosaic update
unit 43 classifies macroblocks tagged as INTER1V prediction
type as foreground when the global motion estimation
residuals GMER(j,k) are greater than a portion (specified by
the value θ) of the INTER1V residuals RES(j,k). Otherwise,
the INTER1V macroblocks are classified as background.
INTER4V macroblocks are classified as foreground.
MOSAIC and SKIP macroblocks are referred to as
MOSAIC prediction types.
These macroblocks are classified as background.
INTRA macroblocks are classified as foreground.

The mosaic update unit 43 identifies the background and
foreground macroblocks and blends the macroblocks classified
as background into the mosaic 22. The encoder 32 can
then transmit an encoded bit stream including the global
motion parameters, the tagged macroblock prediction type,
the motion vectors associated with the tagged macroblock
prediction type (if the macroblock type demands it), and the
residuals associated with the tagged macroblock prediction
type. A decoder 30 decodes the encoded bit stream to
generate the decoded previous frame 16.

The decoder 35 includes a macroblock detector 38 that
reads the tagged macroblock prediction type in the
bit stream transmitted by encoder 25. The bitstream
data is directed to the relevant decoder depending on the
macroblock type. A frame decoder 37 uses the received
residuals and portions of the previous decoded VOP 16 to
reconstruct INTER1V or INTER4V macroblocks. A mosaic
decoder 45 uses the received residuals and portions of the
mosaic 22 to reconstruct MOSAIC or SKIP macroblock
types. The macroblock decoder and assembler 39 takes the
output of the frame decoder or the mosaic decoder as
appropriate. Neither of these two predictors is used for
INTRA macroblocks and in this case decoder 39 decodes the
INTRA macroblock. A global residual computation unit 31
receives the decoded global motion parameters associated
with the current frame. These global motion parameters are
decoded by unit 47.

The residual signal and macroblock type used by decoder
39 are also passed to the foreground/background segmentation
and mosaic update unit 49 to classify the macroblocks
as foreground or background. The output of the global
residual computation unit 31 is also input to the mosaic
update unit 49. The exact same rules are used as in the
encoder to derive the foreground/background segmentation
map. Specifically, decoded INTER1V prediction type macroblocks
are classified as foreground when the global motion
estimation residuals GMER(j,k) are greater than the portion
of the INTER1V residual RES(j,k). Otherwise, the
assembled macroblocks are classified as background.
Decoded INTRA and INTER4V macroblock types are classified
as foreground. MOSAIC and SKIP macroblocks are
classified as background. The mosaic update unit 49 updates
the mosaic 22 with assembled macroblocks classified as
background.
FIG. 6 describes the overall operation of the automatic 
segmentation encoder 25 according to the invention. 
Step 1: Initialize sprite 

S_t(R,t0) = VO_t(r,t0) if VO_s(r,t0) = 1, and 0 otherwise

S_s(R,t0) = 1 if VO_s(r,t0) = 1, and 0 otherwise

where S_s(), S_t(), VO_s(), VO_t() represent the sprite (mosaic) 
shape, the sprite texture, the decoded VOP shape 
(rectangular shaped VO here) and the decoded VOP texture 
fields, respectively. The sprite shape S_s() and the decoded 
VOP shape VO_s() are binary fields. In the sprite shape 
image, the value 0 means that the mosaic content is not 
determined and the value 1 means the mosaic content is 
determined at this location. In the decoded VO shape image, 
values 0 and 1 mean that the decoded VO is not defined and 
defined at this location, respectively. Position vectors R and 
r represent the pixel position in the sprite and in the VO, 
respectively. 

The content of the mosaic 22 is initialized with the content 
of the first VOP 16. The shape of the sprite is initialized to 
1 over a region corresponding to the rectangular shape of 
VOP 16. The value 1 indicates that texture content has been 
loaded at this location in the mosaic. Instead of dumping the 
first VOP 16 into mosaic 22, an alternative initialization 
process is to initialize the buffers S_s() and S_t() to 0, thereby 
delaying integration of VOP 14 content into the mosaic by 
one image. The benefit of this approach is that it avoids using 
foreground information in the first VOP to initialize the 
mosaic. The automatic segmentation mode discussed below 
is then used for any macroblock inserted into the 
mosaic 22. 
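
The two initialization options described above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function name `init_sprite`, the list-of-lists buffers, and the `offset` argument that places the first VOP inside a larger sprite canvas are all assumptions made for the example.

```python
def init_sprite(vop_texture, sprite_h, sprite_w, offset=(0, 0), load_first_vop=True):
    """Return (S_t, S_s): sprite texture and binary sprite shape buffers.

    vop_texture is a 2-D list holding the first decoded rectangular VOP.
    offset (row, col) is a hypothetical placement of that VOP in the sprite.
    With load_first_vop=False both buffers stay at 0, which delays
    integration of VOP content into the mosaic by one image.
    """
    S_t = [[0] * sprite_w for _ in range(sprite_h)]
    S_s = [[0] * sprite_w for _ in range(sprite_h)]
    if load_first_vop:
        r0, c0 = offset
        for r, row in enumerate(vop_texture):
            for c, pixel in enumerate(row):
                S_t[r0 + r][c0 + c] = pixel  # texture copied from the first VOP
                S_s[r0 + r][c0 + c] = 1      # mosaic content is determined here
    return S_t, S_s


vop = [[10, 20], [30, 40]]
S_t, S_s = init_sprite(vop, 4, 4, offset=(1, 1))
S_t_empty, S_s_empty = init_sprite(vop, 4, 4, load_first_vop=False)
```

In the first call the shape buffer is 1 only over the rectangle covered by the VOP; in the second call the sprite starts empty, so foreground content in the first VOP never reaches the mosaic unfiltered.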
Step 2: Acquire next VOP (time t) and select macroblock 
type. 

The macroblocks 15 are backward warped W_b() and then 
matched with corresponding pixel arrays in mosaic 22. The 
differences between macroblock 15 and the mosaic 22 are the 
residuals for the MOSAIC macroblock type. The same 
backward mapping is used to record the residuals GMER(j,k) 
obtained from the previous decoded VOP 16. The 
macroblock 15 is compared with similar sized pixel arrays in 
previous VOP 16. A macroblock local motion vector maps 
macroblock 15 to a pixel array in previous VOP 16 to derive 
INTER1V residuals. Four local motion vectors are used to 
derive residual values for the INTER4V macroblock type. 
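
As a rough illustration of how such residuals might be computed, the sketch below measures a block residual as a sum of absolute differences (SAD) for a purely translational displacement. Both choices are assumptions: the text does not name the residual metric, and a full implementation of W_b() for affine or perspective models would need interpolation. The helper name `block_residual` and the convention that j is the column and k is the row are also hypothetical.

```python
def block_residual(cur, ref, j, k, dx, dy, size=16):
    """SAD between the size x size macroblock with upper-left corner (j, k)
    in `cur` and the block displaced by (dx, dy) in `ref`.

    j indexes columns and k indexes rows (an assumed convention).
    Pixels that map outside `ref` are skipped.
    """
    total = 0
    for y in range(k, k + size):
        for x in range(j, j + size):
            ry, rx = y + dy, x + dx
            if 0 <= ry < len(ref) and 0 <= rx < len(ref[0]):
                total += abs(cur[y][x] - ref[ry][rx])
    return total


# Previous decoded frame: a simple gradient. The current frame is the same
# scene shifted one pixel to the right, as a camera pan would produce.
prev = [[10 * x + y for x in range(8)] for y in range(8)]
cur = [[prev[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]

gmer = block_residual(cur, prev, 2, 2, dx=-1, dy=0, size=4)      # global motion applied
res_static = block_residual(cur, prev, 2, 2, dx=0, dy=0, size=4)  # no compensation
```

Here the displacement that matches the pan gives a residual of 0, while the uncompensated comparison gives 160, which is the kind of gap the classification in Step 4 relies on.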

If the residual values for MOSAIC, INTER1V and 
INTER4V are all greater than a predefined threshold, the 
macroblock 15 is assigned to MBType(j,k)=INTRA. If one 
or more of the residual values are below the threshold value, 
the macroblock 15 is assigned to the MBType(j,k) with the 
smallest frame or mosaic residual. Note that other policies 
can be implemented to select the macroblock type without 
affecting the invention described herein. 
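
The default selection policy described above can be sketched in a few lines. This is one possible reading, not the only valid policy: the function name `select_mb_type`, the dictionary input, and the tie-breaking order (first key wins) are assumptions. The SKIP case follows the earlier definition of SKIP as a MOSAIC macroblock whose mosaic residual equals 0.

```python
def select_mb_type(residuals, intra_threshold):
    """Pick a macroblock type from candidate residuals.

    residuals maps 'MOSAIC', 'INTER1V' and 'INTER4V' to residual values.
    """
    # INTRA when no predictor is good enough.
    if all(value > intra_threshold for value in residuals.values()):
        return "INTRA"
    # Otherwise take the predictor with the smallest residual.
    best = min(residuals, key=residuals.get)
    # SKIP is MOSAIC with a zero mosaic residual.
    if best == "MOSAIC" and residuals[best] == 0:
        return "SKIP"
    return best


t1 = select_mb_type({"MOSAIC": 5, "INTER1V": 9, "INTER4V": 7}, intra_threshold=50)
t2 = select_mb_type({"MOSAIC": 80, "INTER1V": 90, "INTER4V": 70}, intra_threshold=50)
t3 = select_mb_type({"MOSAIC": 0, "INTER1V": 9, "INTER4V": 7}, intra_threshold=50)
```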
Step 3: Encode and decode the VOP 

The encoder 25 encodes and decodes the VOP 14 at time 
t. The bitstream representing the encoded VOP is 
transmitted to the decoder. The decoder 30 (FIG. 5A) decodes the 
encoded bitstream to generate the decoded VOP 14. 
Step 4: Create binary map to detect macroblocks belonging 
to foreground 

Referring to FIGS. 7 and 9, for every macroblock (j,k) in 
the current decoded rectangular-shaped VOP 14, an object 
segmentation map g(j,k) 72 is built. The encoder 25 extracts 
a macroblock from the current VOP 14 in step 40. Decision 
step 42 tests whether the macroblock is of type MOSAIC or 
SKIP. If the macroblock is of type MOSAIC or of type 
SKIP, the segmentation map 72 is set to zero in step 44. 

if((MBType(j,k)==MOSAIC)||(MBType(j,k)==SKIP)) 

{g(j,k)=0} 

If decision step 46 determines the macroblock is of type 
INTER4V or INTRA, the segmentation map is set to 1 in 
step 48. 

else if(MBType(j,k)==INTER4V) 
{g(j,k)=1} 

If the macroblock is not of types MOSAIC, INTER4V, 
INTRA or SKIP, the global motion estimation residual 
(obtained from applying the global motion parameters 
between the decoded VOP at time t and the decoded VOP at 
time t-1) is compared against the residual from the 
INTER1V macroblock type in decision step 50. If the global 
motion estimation residual is greater than some portion of 
the INTER1V residual (set by θ(QP)), the corresponding 
macroblock in segmentation map 72 is set to 1 in step 52. If 
the global motion estimation residual is not greater, the 
segmentation map is set to 0 in step 54. 

if(GMER(j,k)>θ(QP)*RES(j,k)) 

{g(j,k)=1} 

else 

{g(j,k)=0} 

The binary segmentation map 72 g(j,k) represents the initial 
foreground/background segmentation. Detected foreground 
texture is denoted by setting g(j,k)=1. This is the case 
whenever the INTER4V macroblock type occurs since it 
corresponds to the situation where there are four distinct local 
motion vectors. In other words, the four different motion 
vectors indicate that the image in the macroblock is not 
background. INTRA macroblocks are also considered 
foreground (g(j,k)=1) because the macroblock cannot be 
predicted from the previous decoded VOP or the mosaic. 
INTER1V macroblocks are tagged as foreground when the global 
motion estimation residual GMER(j,k) is larger than the portion of 
the (transmitted) INTER1V residual RES(j,k). In this 
situation, the global motion model does not correspond to 
the local dynamics of the foreground object. 
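
The classification rules of Step 4 can be collected into a single function, sketched below. The function name `classify_macroblock` and the tuple layout of the example data are assumptions, and `theta` stands for the already evaluated threshold θ(QP); the text does not give the form of that function, so the caller supplies its value.

```python
def classify_macroblock(mb_type, gmer, res, theta):
    """Return g(j,k): 1 for foreground, 0 for background."""
    if mb_type in ("MOSAIC", "SKIP"):
        return 0  # predicted from the mosaic: background
    if mb_type in ("INTER4V", "INTRA"):
        return 1  # four distinct vectors or no usable prediction: foreground
    # Remaining case (INTER1V): compare the global residual against a
    # portion of the transmitted frame residual.
    return 1 if gmer > theta * res else 0


# Hypothetical macroblocks: position -> (type, GMER, RES)
macroblocks = {
    (0, 0): ("MOSAIC", 0, 12),
    (16, 0): ("INTER4V", 300, 40),
    (32, 0): ("INTER1V", 500, 100),  # global model fits poorly: foreground
    (48, 0): ("INTER1V", 90, 100),   # global model fits well: background
}
g = {pos: classify_macroblock(t, gm, rs, theta=1.5)
     for pos, (t, gm, rs) in macroblocks.items()}
```

Because the decoder receives the macroblock types, residuals and global motion parameters, it can evaluate the same function and arrive at the same map without any transmitted shape information.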

FIG. 8 explains in further detail how the encoder 25 (FIG. 
5A) distinguishes background from foreground in the 
INTER1V macroblocks. The macroblock 15 in VOP 14 is 
determined by the encoder 25 to be of type INTER1V. 
Although macroblock 15 is encoded as INTER1V type, it is 
not conclusive whether the INTER1V type was used 
because macroblock 15 contains a foreground image or 
because the mosaic 22 is either corrupted with foreground 
content or has not completely incorporated that portion of 
background image contained in macroblock 15. 

The global motion parameters for VOP 14 are applied to
macroblock 15 in box 58. The INTER1V local motion
vector is applied to macroblock 15 in block 56. A pixel array
55 corresponding to the global motion vector is compared to
the macroblock 15 to generate the global motion estimation
residual GMER(j,k) in block 62. The pixel array 18 corresponding
to the INTER1V local motion vector is compared
to the macroblock 15, generating the INTER1V residual
RES(j,k) in block 64. The global motion estimation residual
GMER(j,k) and the INTER1V residual RES(j,k) are compared
in block 66.

When the global residual GMER(j,k) is greater than some
portion of the INTER1V residual RES(j,k), the image in the
macroblock 15 has its own motion and does not correspond
to the global motion induced by panning, zooming, etc. of
the camera. Accordingly, the image in macroblock 15 is
tagged as foreground in block 68. Conversely, when the
INTER1V residual RES(j,k) is greater than the global
residual GMER(j,k), the image in the macroblock 15 is tagged
as background because it is likely to be new content in the
background or a better representation of the background
than what is currently in the mosaic 22. The macroblocks 15
tagged as background are inserted into the mosaic 22.

Step 5: Process segmentation map to make regions more
homogeneous

Step 5 (FIG. 6) removes any isolated 1s or 0s in the binary
segmentation map 72 g() by using a two-dimensional separable
or non-separable rank filter. The filter uses a neighborhood
of macroblocks Ω around a macroblock 74 of
interest at location (j,k). M specifies the number of macroblocks
in this neighborhood. The values of the segmentation
map g() for each of the macroblocks belonging to the
neighborhood Ω are ranked in increasing order in an array
A with M entries.

Since g() can only take the value 0 or 1, A is an array of
M bits where there are K zeros followed by (M-K) ones, K
being the number of times the map g() takes the value 0 in
the neighborhood Ω. Given a pre-fixed rank ρ, 1≤ρ≤M, the
output of the filter is selected as the ρth entry in the array
A, that is A[ρ]. The output of the filter at each macroblock
location (j,k) is used to generate a second segmentation map
h(), such that h(j,k)=A[ρ]. The result of applying the filter to
the segmentation map g() is removal of spurious 1's or 0's
in the initial segmentation, thereby making it more spatially
homogeneous. If the filter is separable, the filtering operation
above is repeated along each dimension (horizontally then
vertically or vice versa). At the end of the first pass, the
output map h() is copied to the map g() before the second
pass is started.

Referring to FIG. 9, the number M of macroblocks in the
neighborhood is 9. For the target macroblock 74, the array
A has 9 entries with 8 zeros in macroblocks g(32,0), g(48,0),
g(64,0), g(32,16), g(64,16), g(32,32), g(48,32) and g(64,32)
followed by a 1 at macroblock g(48,16) (assuming a macroblock
size of 16 pixels vertically and horizontally). Pre-fixed
rank ρ is set at 7 and the output of the filter at the 7th

value 1 and ρ in the range specified above the following
operation is performed. The pixels in the macroblock 15 are
tested in step 80 to determine whether the pixel belongs to
the decoded VOP 16 and whether mosaic content at this
pixel location is already determined.

if((VO_s(r,t)==1)&&(S_s(R,t-1)==1)) {

Decision step 82 determines whether the macroblock 15 is
classified as a foreground macroblock. If the pixel in macroblock
15 is tagged as foreground, the corresponding pixel
array in mosaic 22 is warped forward in step 84 but its
contents are not changed.

if(h(j,k)==1)

{S_t(R,t)=W_f(S_t(R,t-1),w)}

If the macroblock is tagged as background, the mosaic is
forward warped and updated by blending the current content
of VOP 14 in step 86.

else

{S_t(R,t)=(1-α)W_f(S_t(R,t-1),w)+αVO_t(r,t)}

where α specifies the blending factor. The shape of the
mosaic is set to 1 in step 92 to signal that mosaic content at
that location has been determined.

S_s(R,t)=1

}

If the macroblock pixel belongs to the VOP 16, the
content of the mosaic 22 is undetermined (88), and the
macroblock is classified as background (89), the content of
the mosaic is set to the content of the current pixel in the
VOP 14 in step 90 and the mosaic shape is set to 1 in step
92.

else if((VO_s(r,t)==1)&&(S_s(R,t-1)==0))

{if(h(j,k)==0){S_t(R,t)=VO_t(r,t); S_s(R,t)=1}}

After all pixels in the current macroblock 15 have been
processed in decision step 93, step 94 gets the next macroblock.
Otherwise, the next pixel is retrieved in step 78 and
the process described above is repeated.

Step 7: Acquire next VOP

The encoder 26 goes back to step 2 (FIG. 6) to start the
same procedure for the next VOP at time t=t+1.

Automatic Segmentation in a Non-Coding Environment
entry in the array A is 0. The filtered output of the macrob- ^ automatic segmentation described above can also be 

lock 74 is, therefore, zero. A second filtered segmentation 50 used 111 a non-coding environment. In this case, the mac- 
map 76 is generated from the filtered segmentation map 72. roblock sizes B A and B v are no longer imposed by the video 
Step 6: Update mosaic according to new segmentation map coder 26 and are adjusted based on other criteria such as 
Referring to FIG. 10, for every macroblock QM) in the ima S e resolution and object shape complexity. In this case, 
current VOP 14 at time (t), the mosaic 22 is updated as block-based image processing provides increased robustness 

follows. First, the mosaic shape at time t, S, (R, t), is equal 55 in the segmentation by preventing spurious local motion 
to 0 everywhere. Next, given a macroblock position (j,k), let modes t0 be interpreted as global motion of the background 
fl3 object. Furthermore, the value of the threshold 80 is no 

longer a function of a quantizer step but instead becomes a 
function of the noise level in the video 
60 The automatic segmentation for on-line sprite-based cod- 
ing is used in MPEG -4 codecs supporting on-line sprite 
prediction. It can also be used in digital cameras and 
where the variables 1 and p are such that 0iliB A -l and camcorders to generate panoramic images. These panoramic 
0^piB v -l. The variables j+1 and k+p are used to denote images can be used to enhance consumer viewing experi- 

the position of each pixel within the macroblock (j,k). 65 ence (with or without foreground objects) and can also be 
The first macroblock is referenced in step 77 and the first used as representative images in a consumer video database 
pixel in the macroblock is retrieved in step 78. For every (to summarize a video segment that includes camera 



k + p 
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panning, for example). It can be used as a basis for an image 
resolution enhancement system in digital cameras as well. In 
this case, a warping operation is designed to include a 
zooming parameter that matches the desired final resolution 
of the mosaic. 5 
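
As an illustration of the rank filter of Step 5, the following Python sketch applies a non-separable filter over a square neighborhood of a binary macroblock map. The name rank_filter, the indexing of the map by macroblock row and column (rather than by pixel coordinates such as g(48,16)), the square window, and the edge-replication handling at the map border are all assumptions made for this sketch; the description does not specify them.

```python
def rank_filter(g, rank, radius=1):
    """Rank-filter a binary segmentation map (hypothetical helper).

    g      : list of rows of 0/1 labels, one label per macroblock.
    rank   : 1-based rank p with 1 <= p <= M, M = (2*radius + 1) ** 2.
    radius : half-width of the square neighborhood (assumption).
    """
    rows, cols = len(g), len(g[0])
    m = (2 * radius + 1) ** 2
    if not 1 <= rank <= m:
        raise ValueError("rank must satisfy 1 <= rank <= M")
    h = [[0] * cols for _ in range(rows)]
    for j in range(rows):
        for k in range(cols):
            a = []
            for dj in range(-radius, radius + 1):
                for dk in range(-radius, radius + 1):
                    # Assumption: replicate edge labels outside the map.
                    jj = min(max(j + dj, 0), rows - 1)
                    kk = min(max(k + dk, 0), cols - 1)
                    a.append(g[jj][kk])
            a.sort()  # K zeros followed by (M - K) ones
            h[j][k] = a[rank - 1]  # A[p], with p counted from 1
    return h


# Example in the style of FIG. 9: one isolated 1 among eight 0s, rank 7.
g = [[0, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]
h = rank_filter(g, rank=7)
```

With M=9 and rank 7, a macroblock keeps the label 1 only when at least three of the nine labels in its window are 1, so the isolated 1 in the example is removed.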

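The per-pixel mosaic update of Step 6 can be sketched in the same way. This is a minimal Python illustration under stated assumptions, not the encoder's implementation: the function name and argument names are hypothetical, the forward-warped mosaic content and shape are assumed to be computed elsewhere and passed in, the label 1 is taken to mean foreground and 0 background (as the pseudo-code conditions on h(j,k) suggest), and the fall-through case that leaves the warped mosaic untouched is an assumption because the description does not state it.

```python
FOREGROUND, BACKGROUND = 1, 0


def update_mosaic_pixel(vop_value, vop_shape, warped_value, warped_shape,
                        label, alpha=0.5):
    """Return (mosaic_value, mosaic_shape) for one pixel at time t.

    vop_value, vop_shape       : content and shape of the current VOP pixel.
    warped_value, warped_shape : forward-warped mosaic content and shape
                                 from time t-1 (warping not shown here).
    label                      : filtered segmentation label h(j,k).
    alpha                      : blending factor.
    """
    if vop_shape == 1 and warped_shape == 1:
        if label == FOREGROUND:
            # Mosaic is warped forward but its content is not changed.
            return warped_value, 1
        # Background: blend the current VOP content into the mosaic.
        return (1 - alpha) * warped_value + alpha * vop_value, 1
    if vop_shape == 1 and warped_shape == 0 and label == BACKGROUND:
        # Mosaic content undetermined: copy the current VOP pixel.
        return vop_value, 1
    # Assumption: in every other case keep the warped mosaic as it is.
    return warped_value, warped_shape


blended = update_mosaic_pixel(100, 1, 50, 1, BACKGROUND, alpha=0.5)
```

A foreground pixel never writes into the mosaic in this sketch, which matches the stated goal of updating the mosaic only with macroblocks classified as background.
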
Having described and illustrated the principles of the 
invention in a preferred embodiment thereof, it should be 
apparent that the invention can be modified in arrangement 
and detail without departing from such principles. I claim all 
modifications and variations coming within the spirit and
scope of the following claims. 

What is claimed is: 

1. A method for automatically segmenting foreground and 
background objects in images, comprising: 

encoding and decoding a first image at a first time
reference;

extracting macroblocks from a second image at a second
time reference;

mapping the macroblocks with corresponding arrays in
the decoded first image according to a macroblock local
vector;

deriving frame residuals between the macroblocks and the
corresponding arrays;

mapping macroblocks to the first image according to a
global motion model;

deriving global residuals between the macroblocks and
the corresponding global motion compensated array in
the first image;

tagging the macroblocks as a frame prediction type based
on one local motion vector;

classifying the macroblocks as foreground or background
by comparing the global residuals with the derived
frame residuals;

classifying the macroblocks as foreground when the global
residuals are greater than a function of the frame
residuals;

classifying the macroblocks as background when the 
frame residuals are greater than some function of the 
global residuals; 

updating the mosaic with macroblocks tagged as
background;

creating a segmentation map that identifies the
macroblocks in the second image as either foreground or
background;

smoothing the segmentation map to remove extraneous
foreground and background macroblocks in the
segmentation map, wherein smoothing the segmentation
map includes:

taking macroblock neighbors around a target
macroblock in the segmentation map;

taking the segmentation map values for each of the
macroblock neighbors and the target macroblock;

ranking the segmentation map values in increasing
order; and

selecting the output of the target macroblock as the
value of the ranked neighbor at a selected threshold;

updating the mosaic with the identified background
macroblocks in the smoothed segmentation map;

forward warping the mosaic but not changing the contents
of the mosaic when all of the following conditions
occur:

pixel sample values in the macroblocks belong to a
decoded video object plane;

the mosaic content is already determined at the pixel
locations; and

the macroblock is labeled as foreground;

forward warping the mosaic and blending the pixel
sample values into the mosaic when all of the following
conditions occur:

pixels in the macroblock belong to the decoded video
object plane;

the mosaic content is already determined at the pixel
locations; and

the macroblock is labeled as background;

forward warping and updating the mosaic content with
content of the pixel sample values in the decoded
second image when all of the following conditions
occur:

pixels in the macroblock belong to the decoded second
image;

the mosaic content is undetermined at the pixel
locations; and

the macroblock is labeled as background; and

initializing the mosaic by either inserting the first decoded
image into the mosaic and then updating the mosaic in
time only with macroblocks classified as background or
setting a mosaic buffer to zero everywhere and then
incrementally updating the mosaic in time only with
macroblocks classified as background.

2. A method for automatically segmenting foreground and 
background objects in images, comprising: 

encoding and decoding a first image at a first time 
reference; 

extracting macroblocks from a second image at a second
time reference;

mapping the macroblocks with corresponding arrays in
the decoded first image according to a macroblock local
vector;

deriving frame residuals between the macroblocks and the
corresponding arrays;

mapping macroblocks to the first image according to a
global motion;

deriving global residuals between the macroblocks and
the corresponding global motion compensated array in
the first image;

classifying the macroblocks as foreground or background
by comparing the global residuals with the derived
frame residuals; and

initializing a mosaic by inserting the first decoded image
into the mosaic and then updating the mosaic in time
only with macroblocks classified as background.

3. Computer code stored on a computer readable medium, 
comprising: 

code to separate a frame into multiple macroblocks; 

code to match the macroblocks with pixel arrays in a 
previously decoded frame according to local motion 
vectors and generating frame residuals associated with 
the macroblocks; 

code to match the macroblocks with pixel arrays in the 
previously decoded frame according to frame global 
motion parameters to generate a global residual; 

code to classify the frame prediction type macroblocks as 
either foreground or background by comparing the 
global residuals with the frame residuals; and 

code to initialize a mosaic by either inserting the frame 
into the mosaic and then updating the mosaic in time 
only with macroblocks classified as background or 
setting a mosaic buffer to zero everywhere and then 
incrementally updating the mosaic in time only with 
macroblocks classified as background. 
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4. A decoder, comprising: 

a macroblock detector detecting a tagged macroblock
prediction type in an encoded bit stream;

a frame decoder reconstructing macroblocks tagged as a
frame prediction type according to a previously
decoded frame according to frame residuals;

a mosaic decoder reconstructing macroblocks tagged as a
mosaic prediction type according to a mosaic;

a global decoder receiving global residuals for a current
frame and classifying assembled frame prediction type
macroblocks as foreground when global residuals are
greater than a function of the macroblock frame
residuals and otherwise classifying the assembled
macroblocks as background; and

a mosaic updater to initialize the mosaic by either
inserting a decoded frame into the mosaic and then updating
the mosaic in time only with reconstructed macroblocks
classified as background or setting a mosaic buffer
to zero everywhere and then incrementally updating the
mosaic in time only with macroblocks classified as
background.

5. A decoder, comprising: 

a macroblock detector detecting a tagged macroblock
prediction type in an encoded bit stream;

a frame decoder reconstructing macroblocks tagged as a
frame prediction type according to a previously
decoded frame according to frame residuals;

a mosaic decoder reconstructing macroblocks tagged as a
mosaic prediction type according to a mosaic;

a global decoder receiving global residuals for a current
frame and classifying assembled frame prediction type
macroblocks as foreground when global residuals are
greater than a function of the macroblock frame
residuals and otherwise classifying the assembled
macroblocks as background;

a mosaic updater to initialize the mosaic by either
inserting a decoded frame into the mosaic and then updating
the mosaic in time only with reconstructed macroblocks
classified as background or setting a mosaic
buffer to zero everywhere and then incrementally
updating the mosaic in time only with macroblocks
classified as background;

wherein the macroblock decoder tags the macroblocks
according to the following:

single local motion vector frame prediction type for the
macroblocks having a single local motion vector;

multiple local motion vector frame prediction type for
the macroblocks having multiple local motion vectors;

MOSAIC prediction type for the macroblocks predicted
according to global motion parameters;

SKIP prediction type corresponding to a MOSAIC
prediction type for which the residual signal is zero;

INTRA prediction type for the macroblocks not using
any prediction; and

the macroblock encoder classifying multiple local
motion vector and INTRA prediction types as
foreground, and the MOSAIC and SKIP prediction
types as background.

6. A method for automatically segmenting foreground and 
background objects in images, comprising: 

encoding and decoding a first image at a first time
reference;

extracting macroblocks from a second image at a second
time reference;

mapping the macroblocks with corresponding arrays in the
decoded first image according to a macroblock local
vector representing frame to frame dynamics of the
objects in the images;

deriving frame residuals between the macroblocks and the
corresponding arrays in the first image representing the
difference between the macroblocks and corresponding
arrays in the first image matched by the local vector;

mapping macroblocks to the first image according to a
global motion induced by a system receiving the
images;

deriving global residuals between the macroblocks and
the corresponding global motion compensated array in
the first image representing the difference between the
macroblocks and corresponding arrays in the first
image matched by the global motion;

tagging the macroblocks as different frame prediction
types based on a particular type of local vector used for
mapping the macroblocks to the first image;

classifying the macroblocks as foreground or background
by comparing the global residuals with the derived
frame residuals; and

updating a mosaic in time only with macroblocks
classified as background.

7. An automatic segmentation system for mosaic based
encoding, comprising:

a macroblock separator separating a frame into multiple
macroblocks;

a frame predictor matching the macroblocks with pixel
arrays in a previously decoded frame according to local
motion vectors representing frame to frame dynamics
of images in the frame and generating frame residuals
associated with the macroblocks representing the
difference between the macroblocks and the pixel arrays
in the previously decoded frame matched by the local
motion vectors;

a mosaic predictor matching the macroblocks with pixel
arrays in a mosaic according to global motion
parameters and generating residuals associated with the
macroblocks;

a global motion predictor matching the macroblocks with
pixel arrays in the previously decoded frame according
to frame global motion parameters representing image
movement induced by a system receiving the images
and generating a global residual representing the
difference between the macroblocks and the pixel arrays
in the previously decoded frame matched by the global
motion parameters;

a macroblock encoder tagging the macroblocks as mosaic
prediction type when the mosaic residuals are used for
encoding the macroblocks, and tagging the macroblocks
as frame prediction type based on a local motion
vector when the frame residuals are used for encoding
the macroblocks, the macroblock encoder classifying
the frame prediction type macroblocks as either
foreground or background by comparing the global
residuals with the frame residuals and classifying the mosaic
prediction type as background; and

a mosaic updater that updates the mosaic in time only with
macroblocks classified as background.

* * * * * 
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